Every AI output should be evaluated against the approved visual references: mood board, reference reel, or creative brief. Color palette, lighting quality, compositional style, motion language, and technical integrity across frames. This is not subjective: either the output matches the approved visual language or it doesn't.
Every asset has non-negotiable anchors: product accuracy, talent likeness, brand tone, licensed IP, legal constraints. These must be evaluated separately from visual quality. A beautiful shot that misrepresents a product or drifts from a brand standard is not production-ready regardless of how well it matches the references.
Generative work must also be screened for IP it was never meant to include. Models are trained on vast catalogs of copyrighted material and will surface logos, characters, likenesses, music cues, or brand elements that were never licensed for the project. If an output unintentionally incorporates IP that isn't cleared for use, it fails Anchor Fidelity, regardless of how well it satisfies every other criterion.
Generative outputs can be visually correct and emotionally wrong. The test: show the output to someone unfamiliar with the brief and ask what emotion or message it conveys. If their read matches the brief's intended tone and audience, it passes. If it doesn't, no amount of visual polish makes it production-ready. This is the layer most evaluation frameworks skip, and the one most directly connected to resonance.
Generative loops are not free. Every pass consumes compute, time, and creative bandwidth. The evaluation framework has to define not just pass/fail criteria but iteration limits: when to re-prompt, when to escalate to human intervention, and when to abandon the AI path entirely for a given element.
The evaluation framework is only as useful as the feedback loop it feeds. Every correction, escalation, or rejection is information. The goal isn't just to QC individual assets. It's to build the dataset of human creative judgment that makes the next generation of outputs more resonant, and to keep the humans who understand what resonance actually means firmly in the loop.